Boolean function
Neural Sculpting: Uncovering hierarchically modular task structure in neural networks through pruning and network analysis
Natural target functions and tasks typically exhibit hierarchical modularity -- they can be broken down into simpler sub-functions that are organized in a hierarchy. Such sub-functions have two important features: they have a distinct set of inputs (input-separability) and they are reused as inputs higher in the hierarchy (reusability). Previous studies have established that hierarchically modular neural networks, which are inherently sparse, offer benefits such as learning efficiency, generalization, multi-task learning, and transfer. However, identifying the underlying sub-functions and their hierarchical structure for a given task can be challenging. The high-level question in this work is: if we learn a task using a sufficiently deep neural network, how can we uncover the underlying hierarchy of sub-functions in that task? As a starting point, we examine the domain of Boolean functions, where it is easier to determine whether a task is hierarchically modular. We propose an approach based on iterative unit and edge pruning (during training), combined with network analysis for module detection and hierarchy inference. Finally, we demonstrate that this method can uncover the hierarchical modularity of a wide range of Boolean functions and two vision tasks based on the MNIST digits dataset.
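To make the two features concrete, here is a minimal, hypothetical example of a hierarchically modular Boolean function (not taken from the paper): two sub-functions with disjoint input sets (input-separability) whose outputs are reused as the inputs of a higher-level function (reusability).

```python
from itertools import product

def sub_a(x1, x2):
    # Sub-function over its own distinct input subset {x1, x2}
    return x1 and x2

def sub_b(x3, x4):
    # Sub-function over a disjoint input subset {x3, x4} (XOR)
    return x3 != x4

def f(x1, x2, x3, x4):
    # The higher level reuses the sub-functions' outputs as its inputs
    return sub_a(x1, x2) or sub_b(x3, x4)

# Full truth table over all 16 input combinations
table = {bits: f(*bits) for bits in product([False, True], repeat=4)}
```

A pruning-based method of the kind described above would aim to recover exactly this decomposition (two modules, each wired to its own input pair) from a dense network trained on `f`.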
WARP-LUTs -- Walsh-Assisted Relaxation for Probabilistic Look-Up Tables
Gerlach, Lino, Våge, Liv, Gerlach, Thore, Kauffman, Elliott, Ojalvo, Isobel
Fast and efficient machine learning is of growing interest to the scientific community and has spurred significant research into novel model architectures and hardware-aware design. Recent hardware-software co-design approaches have demonstrated impressive results with entirely multiplication-free models. Differentiable Logic Gate Networks (DLGNs), for instance, provide a gradient-based framework for learning optimal combinations of low-level logic gates, setting state-of-the-art trade-offs between accuracy, resource usage, and latency. However, these models suffer from high computational cost during training and do not generalize well to logic blocks with more inputs. In this work, we introduce Walsh-Assisted Relaxation for Probabilistic Look-Up Tables (WARP-LUTs) -- a novel gradient-based method that efficiently learns combinations of logic gates with substantially fewer trainable parameters. We demonstrate that WARP-LUTs achieve significantly faster convergence on CIFAR-10 compared to DLGNs, while maintaining comparable accuracy. Furthermore, our approach suggests potential for extension to higher-input logic blocks, motivating future research on extremely efficient deployment on modern FPGAs and real-time scientific applications.
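The core idea of a differentiable (relaxed) look-up table can be sketched as follows. This is a generic soft-LUT relaxation under assumed conventions, not the paper's exact WARP-LUT parameterisation: each of the `2**k` truth-table entries gets a trainable logit, and the expected output is computed under independent Bernoulli inputs.

```python
import numpy as np

def soft_lut(theta, p):
    """Differentiable relaxation of a k-input look-up table (sketch).

    theta : (2**k,) real-valued logits, one per truth-table entry
            (hypothetical parameterisation, for illustration only).
    p     : (k,) probabilities that each Boolean input equals 1.
    Returns the expected LUT output under independent Bernoulli inputs.
    """
    k = len(p)
    out = 0.0
    for idx in range(2 ** k):
        bits = [(idx >> j) & 1 for j in range(k)]
        # Probability that the soft inputs select this truth-table entry
        sel = np.prod([p[j] if b else 1 - p[j] for j, b in enumerate(bits)])
        # Sigmoid relaxes the stored 0/1 entry to the open interval (0, 1)
        out += sel / (1 + np.exp(-theta[idx]))
    return out
```

Because `soft_lut` is smooth in both `theta` and `p`, such tables can be trained by gradient descent and later hardened into ordinary FPGA LUTs by thresholding the logits.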
Transformers with RL or SFT Provably Learn Sparse Boolean Functions, But Differently
Lyu, Bochen, Jia, Yiyang, Cai, Xiaohao, Zhu, Zhanxing
Large language models (LLMs), with the transformer architecture as their core building block, are remarkably successful across a wide range of tasks, in particular reasoning. LLMs excel at solving complex reasoning tasks by iteratively generating intermediate steps [Wei et al., 2022] -- an intriguing approach known as Chain-of-Thought (CoT). Fine-tuning has been shown to be a powerful method for enhancing efficient CoT generation in LLMs, which in turn significantly improves their multi-step reasoning performance [Wei et al., 2022, Zelikman et al., 2022, Lightman et al., 2024]. A widely adopted approach for fine-tuning to generate CoT is supervised fine-tuning (SFT), where the transformers are trained to minimize a loss over pairs of inputs and labeled outputs. While straightforward, SFT is restricted by its demand for a large amount of labeled CoT data. As a result, fine-tuning approaches based on reinforcement learning (RL) [DeepSeek-AI et al., 2025, Ouyang et al., 2022, Bai et al., 2022, Christiano et al., 2023, Kumar et al., 2024] are increasingly prevalent. Instead of minimizing a loss over labeled CoT data, RL guides transformers to generate CoT for complex reasoning tasks by maximizing a reward function via policy gradient methods [Mnih et al., 2016, Schulman et al., 2017, DeepSeek-AI et al., 2025], which has shown significant potential for improving the reasoning capabilities of LLMs.
We thank the reviewers for the feedback and comments. In what follows we address the specific comments made by the reviewers.

Reviewer:

> I do not completely understand (apart from some parts of the proofs) why you refer to these functions as graph-based. Boolean k-ary functions may be thought of as hypergraphs.

The definition should not be unusual, and it will be clarified to avoid any possible confusion. This is completely analogous to the standard empirical distribution for hypothesis classes.

> It might be helpful to summarise, ..., some basic properties of this new notion of VC dimension. ... Is there a Sauer-Shelah type upper bound on the size of the class in terms of the graph VC dimension? (Does small VC dimension entail small graph VC dimension?)

There is a Sauer-Shelah lemma for graph VC dimension; indeed this is noteworthy, and we will discuss it in the main text.
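For reference, the classical Sauer-Shelah lemma to which the exchange alludes can be stated as follows (the open question above is whether an analogous bound holds with the graph VC dimension in place of $d$):

```latex
% Classical Sauer--Shelah lemma: if \mathcal{H} has VC dimension d,
% then its restriction to any m points is polynomially bounded.
\[
  \bigl|\mathcal{H}|_{S}\bigr| \;\le\; \sum_{i=0}^{d} \binom{m}{i}
  \qquad \text{for every } S \text{ with } |S| = m .
\]
```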